Markov Decision Models with Weighted Discounted Criteria
Abstract
We consider a discrete-time Markov Decision Process with an infinite horizon. The criterion to be maximized is the sum of a number of standard discounted rewards, each with a different discount factor. Such criteria arise in modeling investments and production, in modeling projects of different durations, in systems with multiple criteria, and in some axiomatic formulations of multi-attribute preference theory. We show that for this criterion, for some positive ε, there need not exist an ε-optimal (randomized) stationary strategy, even when the state and action sets are finite. However, ε-optimal Markov (non-randomized) strategies and optimal Markov strategies exist under weak conditions. We exhibit ε-optimal Markov strategies that are stationary from some time onward. When both the state and action spaces are finite, there exists an optimal Markov strategy with this property. We provide an explicit algorithm for computing such strategies and describe the set of optimal strategies.
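The claim that stationary strategies can fall short while a Markov strategy that becomes stationary later succeeds can be seen in a tiny case. The following is a minimal sketch on a hypothetical one-state, two-action MDP. The discount factors 0.9 and 0.5, the action names `A` and `B`, and all reward values are invented for illustration. This is not the paper's algorithm.

With a single state, the weighted criterion sum_k sum_t beta_k^t r_k(a_t) decouples across periods. An optimal Markov policy therefore just picks, at each time t, an action maximizing sum_k beta_k^t r_k(a). Because the larger discount factor eventually dominates, that choice settles on one action after finitely many steps.

```python
# Hypothetical one-state, two-action example; all numbers are illustrative assumptions.
betas = (0.9, 0.5)      # discount factors beta_1, beta_2
rewards = {             # rewards[action] = (r_1(action), r_2(action))
    "A": (0.0, 2.0),
    "B": (1.0, 0.0),
}


def stage_payoff(action, t):
    """Weighted contribution of playing `action` at time t: sum_k beta_k**t * r_k(action)."""
    return sum(b ** t * r for b, r in zip(betas, rewards[action]))


def greedy_markov_policy(horizon):
    """With one state the periods decouple, so the best Markov policy picks the best action per period."""
    return [max(rewards, key=lambda a: stage_payoff(a, t)) for t in range(horizon)]


def value(prefix, tail_action):
    """Value of playing the actions in `prefix`, then `tail_action` forever (geometric tail in closed form)."""
    n = len(prefix)
    head = sum(stage_payoff(a, t) for t, a in enumerate(prefix))
    tail = sum(b ** n * r / (1 - b) for b, r in zip(betas, rewards[tail_action]))
    return head + tail


def randomized_stationary_value(q):
    """Value of the stationary policy that plays A with probability q at every step (linear in q)."""
    return sum(
        (q * ra + (1 - q) * rb) / (1 - b)
        for b, ra, rb in zip(betas, rewards["A"], rewards["B"])
    )


if __name__ == "__main__":
    print("Markov policy (first 5 steps):", greedy_markov_policy(5))
    print("Markov value:", value(["A", "A"], "B"))
    print("Best stationary value:", max(randomized_stationary_value(i / 100) for i in range(101)))
```

Here the policy A, A, B, B, ... earns 2 + 1 + 0.81/0.1 = 11.1. Every randomized stationary policy earns 10(1 - q) + 4q, which is at most 10. So in this toy instance no stationary policy is ε-optimal for ε < 1.1, while the optimal Markov policy is stationary from time 2 onward.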
Similar articles
Constrained Markov Decision Models with Weighted Discounted Rewards
This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (non...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Second Order Optimality in Transient and Discounted Markov Decision Chains
Abstract. The article is devoted to second order optimality in Markov decision processes. Attention is primarily focused on the reward variance for discounted models and undiscounted transient models (i.e. where the spectral radius of the transition probability matrix is less than unity). Considering the second order optimality criteria means that in the class of policies maximizing (or minimiz...
Chapter for MARKOV DECISION PROCESSES
Mixed criteria are linear combinations of standard criteria which cannot be represented as standard criteria. Linear combinations of total discounted and average rewards as well as linear combinations of total discounted rewards are examples of mixed criteria. We discuss the structure of optimal policies and algorithms for their computation for problems with and without constraints.
Weighted Discounted Stochastic Games with Perfect Information
We consider a two-person zero-sum stochastic game with an infinite time horizon. The payoff is a linear combination of expected total discounted rewards with different discount factors. For a model with a countable state space and compact action sets, we characterize the set of persistently optimal (sub-game perfect) policies. For a model with finite state and action sets and with perfect informatio...
Journal: Math. Oper. Res.
Volume: 19
Issue:
Pages: -
Publication date: 1994